Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 25652 |
| Missing cells | 64490 |
| Missing cells (%) | 11.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 4.3 MiB |
| Average record size in memory | 176.0 B |
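The 64490 missing cells can be cross-checked against the per-variable missing counts reported in the variable sections. Thirteen variables have missing values. The percentage is that total divided by the number of cells, 25652 × 22. The sketch below is a minimal check in plain Python. The dictionary is transcribed by hand from this report and is not read from the data.

```python
# Missing-value counts per variable, transcribed from the variable sections
# of this report; variables not listed here have no missing values.
missing = {
    "rooms_number": 116,
    "area": 1786,
    "kitchen_has": 2187,
    "furnished": 2832,
    "open_fire": 2658,
    "terrace": 6534,
    "terrace_area": 11427,
    "garden": 3685,
    "garden_area": 10074,
    "land_surface": 6269,
    "land_plot_surface": 8371,
    "facades_number": 5580,
    "swimming_pool_has": 2971,
}

n_observations = 25652
n_variables = 22

total_missing = sum(missing.values())
total_cells = n_observations * n_variables
missing_pct = 100 * total_missing / total_cells

print(total_missing)          # 64490
print(f"{missing_pct:.1f}%")  # 11.4%
```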
Variable types
| NUM | 11 |
|---|---|
| CAT | 6 |
| BOOL | 5 |
Warnings
| Warning | Type |
|---|---|
| locality has a high cardinality: 867 distinct values | High cardinality |
| property_subtype has a high cardinality: 178 distinct values | High cardinality |
| Unnamed: 0 is highly correlated with df_index | High correlation |
| df_index is highly correlated with Unnamed: 0 | High correlation |
| area has 1786 (7.0%) missing values | Missing |
| kitchen_has has 2187 (8.5%) missing values | Missing |
| furnished has 2832 (11.0%) missing values | Missing |
| open_fire has 2658 (10.4%) missing values | Missing |
| terrace has 6534 (25.5%) missing values | Missing |
| terrace_area has 11427 (44.5%) missing values | Missing |
| garden has 3685 (14.4%) missing values | Missing |
| garden_area has 10074 (39.3%) missing values | Missing |
| land_surface has 6269 (24.4%) missing values | Missing |
| land_plot_surface has 8371 (32.6%) missing values | Missing |
| facades_number has 5580 (21.8%) missing values | Missing |
| swimming_pool_has has 2971 (11.6%) missing values | Missing |
| rooms_number is highly skewed (γ1 = 26.40591264) | Skewed |
| area is highly skewed (γ1 = 67.92246901) | Skewed |
| terrace_area is highly skewed (γ1 = 50.34765983) | Skewed |
| garden is highly skewed (γ1 = 95.45319567) | Skewed |
| garden_area is highly skewed (γ1 = 26.60671719) | Skewed |
| land_surface is highly skewed (γ1 = 113.7686367) | Skewed |
| land_plot_surface is highly skewed (γ1 = 38.43006696) | Skewed |
| df_index has unique values | Unique |
| Unnamed: 0 has unique values | Unique |
| rooms_number has 601 (2.3%) zeros | Zeros |
| area has 983 (3.8%) zeros | Zeros |
| terrace_area has 6711 (26.2%) zeros | Zeros |
| garden has 15743 (61.4%) zeros | Zeros |
| garden_area has 11761 (45.8%) zeros | Zeros |
| land_surface has 11203 (43.7%) zeros | Zeros |
| land_plot_surface has 911 (3.6%) zeros | Zeros |
| facades_number has 8610 (33.6%) zeros | Zeros |
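The γ1 values in the skewness warnings above are sample skewness coefficients. The sketch below implements the adjusted Fisher-Pearson coefficient G1, which is what pandas' `Series.skew()` computes. It assumes, without confirming, that pandas-profiling v2.9.0 delegates to pandas for this statistic. The toy data is hypothetical: a few typical values plus one extreme outlier, which mimics the long right tails of `area` or `land_surface`.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (bias-corrected sample skewness)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(values) / n
    # Biased central moments m2 and m3.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # treat constant input as having no asymmetry
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor sqrt(n(n-1)) / (n-2).
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

symmetric = [1, 2, 3, 4, 5]
right_tailed = [100, 120, 150, 160, 200, 250, 73_500]  # one huge outlier

print(sample_skewness(symmetric))     # 0.0
print(sample_skewness(right_tailed))  # large positive value
```

A single outlier in a small sample pushes G1 toward its ceiling of √n. In a column of 20000+ values, a handful of implausible entries can produce γ1 values as large as the ones flagged here.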
Reproduction
| Analysis started | 2020-11-19 10:34:05.092720 |
|---|---|
| Analysis finished | 2020-11-19 10:34:49.601441 |
| Duration | 44.51 seconds |
| Software version | pandas-profiling v2.9.0 |
df_index
Real number (ℝ≥0)
| Distinct | 25652 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25616.43735 |
|---|---|
| Minimum | 0 |
| Maximum | 51302 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2580.55 |
| Q1 | 12796.5 |
| median | 25497.5 |
| Q3 | 38510.25 |
| 95-th percentile | 48763.9 |
| Maximum | 51302 |
| Range | 51302 |
| Interquartile range (IQR) | 25713.75 |
Descriptive statistics
| Standard deviation | 14827.02187 |
|---|---|
| Coefficient of variation (CV) | 0.5788088979 |
| Kurtosis | -1.202410756 |
| Mean | 25616.43735 |
| Median Absolute Deviation (MAD) | 12861 |
| Skewness | 0.006107397401 |
| Sum | 657112851 |
| Variance | 219840577.6 |
| Monotonicity | Not monotonic |
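Several rows in the quantile and descriptive statistics tables follow from simple definitions:

- CV is the standard deviation divided by the mean.
- The IQR is Q3 − Q1.
- The MAD is the median of the absolute deviations from the median.

The sketch below uses only the standard library. It makes two assumptions about how pandas-profiling computes these values: quantiles interpolate linearly between order statistics (pandas' default), and the standard deviation uses the n − 1 denominator. The toy list is hypothetical. The last lines recompute the CV and IQR of `price` from the figures reported in its section.

```python
import statistics

def describe(values):
    """Sketch of a few pandas-profiling-style summary statistics."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    # 'inclusive' quantiles interpolate linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mad = statistics.median([abs(x - median) for x in values])
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mad": mad,
    }

toy = [1, 2, 2, 3, 3, 3, 4, 5, 9]  # hypothetical room counts
print(describe(toy))

# Cross-check against the reported figures for `price`.
price_std, price_mean = 556720.9564, 432368.0558
price_q1, price_q3 = 190000, 460000
print(round(price_std / price_mean, 6))  # 1.287609 (reported CV: 1.287608899)
print(price_q3 - price_q1)               # 270000 (reported IQR)
```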
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10235 | 1 | < 0.1% |
| 13644 | 1 | < 0.1% |
| 46396 | 1 | < 0.1% |
| 48445 | 1 | < 0.1% |
| 42302 | 1 | < 0.1% |
| 44351 | 1 | < 0.1% |
| 21824 | 1 | < 0.1% |
| 17730 | 1 | < 0.1% |
| 30020 | 1 | < 0.1% |
| 27975 | 1 | < 0.1% |
| Other values (25642) | 25642 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 51302 | 1 | < 0.1% |
| 51301 | 1 | < 0.1% |
| 51300 | 1 | < 0.1% |
| 51299 | 1 | < 0.1% |
| 51298 | 1 | < 0.1% |
Unnamed: 0
Real number (ℝ≥0)
| Distinct | 25652 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25616.43735 |
|---|---|
| Minimum | 0 |
| Maximum | 51302 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2580.55 |
| Q1 | 12796.5 |
| median | 25497.5 |
| Q3 | 38510.25 |
| 95-th percentile | 48763.9 |
| Maximum | 51302 |
| Range | 51302 |
| Interquartile range (IQR) | 25713.75 |
Descriptive statistics
| Standard deviation | 14827.02187 |
|---|---|
| Coefficient of variation (CV) | 0.5788088979 |
| Kurtosis | -1.202410756 |
| Mean | 25616.43735 |
| Median Absolute Deviation (MAD) | 12861 |
| Skewness | 0.006107397401 |
| Sum | 657112851 |
| Variance | 219840577.6 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10235 | 1 | < 0.1% |
| 13644 | 1 | < 0.1% |
| 46396 | 1 | < 0.1% |
| 48445 | 1 | < 0.1% |
| 42302 | 1 | < 0.1% |
| 44351 | 1 | < 0.1% |
| 21824 | 1 | < 0.1% |
| 17730 | 1 | < 0.1% |
| 30020 | 1 | < 0.1% |
| 27975 | 1 | < 0.1% |
| Other values (25642) | 25642 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 51302 | 1 | < 0.1% |
| 51301 | 1 | < 0.1% |
| 51300 | 1 | < 0.1% |
| 51299 | 1 | < 0.1% |
| 51298 | 1 | < 0.1% |
locality
Categorical
| Distinct | 867 |
|---|---|
| Distinct (%) | 3.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| unknown | 11707 | 45.6% |
| 8300 | 562 | 2.2% |
| 1180 | 471 | 1.8% |
| 1000 | 384 | 1.5% |
| 1050 | 352 | 1.4% |
| 9000 | 290 | 1.1% |
| 8400 | 220 | 0.9% |
| 4000 | 164 | 0.6% |
| 1200 | 159 | 0.6% |
| 1070 | 146 | 0.6% |
| Other values (857) | 11197 | 43.6% |
Frequencies of value counts
Unique
| Unique | 117 |
|---|---|
| Unique (%) | 0.5% |
Histogram of lengths of the category
Length
| Max length | 7 |
|---|---|
| Median length | 4 |
| Mean length | 5.369133011 |
| Min length | 4 |
house_is
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| True | 12339 | 48.1% |
| False | 11007 | 42.9% |
| unknown | 2306 | 9.0% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 7 |
|---|---|
| Median length | 5 |
| Mean length | 4.698775924 |
| Min length | 4 |
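The Unique and Length rows for a categorical variable can be reproduced from its value counts alone. The sketch below does this for `house_is` by expanding the three reported counts back into a list of strings. It rests on two assumptions:

- Unique counts values that occur exactly once, as opposed to Distinct, which counts how many different values there are.
- The length statistics are taken over every non-missing value.

Both assumptions are consistent with the figures reported above.

```python
import statistics
from collections import Counter

# Value counts for `house_is`, transcribed from the table above.
counts = {"True": 12339, "False": 11007, "unknown": 2306}

# Rebuild the column as a flat list of strings.
values = [value for value, count in counts.items() for _ in range(count)]

freq = Counter(values)
distinct = len(freq)                              # number of different values
unique = sum(1 for c in freq.values() if c == 1)  # values seen exactly once

lengths = [len(v) for v in values]
length_stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}

print(distinct, unique)  # 3 0
print(length_stats)
```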
property_subtype
Categorical
| Distinct | 178 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| HOUSE | 7678 | 29.9% |
| APARTMENT | 5000 | 19.5% |
| house | 1787 | 7.0% |
| apartment | 1766 | 6.9% |
| VILLA | 1607 | 6.3% |
| APARTMENT_BLOCK | 921 | 3.6% |
| MIXED_USE_BUILDING | 865 | 3.4% |
| Apartment | 639 | 2.5% |
| PENTHOUSE | 454 | 1.8% |
| DUPLEX | 439 | 1.7% |
| Other values (168) | 4496 | 17.5% |
Frequencies of value counts
Unique
| Unique | 35 |
|---|---|
| Unique (%) | 0.1% |
Histogram of lengths of the category
Length
| Max length | 35 |
|---|---|
| Median length | 6 |
| Mean length | 7.842585373 |
| Min length | 1 |
price
Real number (ℝ≥0)
| Distinct | 2072 |
|---|---|
| Distinct (%) | 8.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 432368.0558 |
|---|---|
| Minimum | 0 |
| Maximum | 15000000 |
| Zeros | 33 |
| Zeros (%) | 0.1% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 190000 |
| median | 294983 |
| Q3 | 460000 |
| 95-th percentile | 1300000 |
| Maximum | 15000000 |
| Range | 15000000 |
| Interquartile range (IQR) | 270000 |
Descriptive statistics
| Standard deviation | 556720.9564 |
|---|---|
| Coefficient of variation (CV) | 1.287608899 |
| Kurtosis | 82.50475865 |
| Mean | 432368.0558 |
| Median Absolute Deviation (MAD) | 124017 |
| Skewness | 6.222075963 |
| Sum | 1.109110537e+10 |
| Variance | 3.099382233e+11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 772 | 3.0% |
| 295000 | 305 | 1.2% |
| 2 | 298 | 1.2% |
| 275000 | 294 | 1.1% |
| 249000 | 289 | 1.1% |
| 199000 | 283 | 1.1% |
| 395000 | 272 | 1.1% |
| 225000 | 264 | 1.0% |
| 299000 | 245 | 1.0% |
| 349000 | 227 | 0.9% |
| Other values (2062) | 22403 | 87.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 33 | 0.1% |
| 1 | 772 | 3.0% |
| 2 | 298 | 1.2% |
| 3 | 105 | 0.4% |
| 4 | 81 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15000000 | 3 | < 0.1% |
| 9500000 | 1 | < 0.1% |
| 8750000 | 1 | < 0.1% |
| 6732500 | 2 | < 0.1% |
| 6700000 | 2 | < 0.1% |
sale
Categorical
| Distinct | 17 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unknown | 13883 | 54.1% |
| residential_sale | 4750 | 18.5% |
| unknown | 4726 | 18.4% |
| Apartment | 1467 | 5.7% |
| first_session_with_reserve_price | 297 | 1.2% |
| Wohnung | 142 | 0.6% |
| Public Sale | 89 | 0.3% |
| Huis | 67 | 0.3% |
| House | 63 | 0.2% |
| Maison | 53 | 0.2% |
| Other values (7) | 115 | 0.4% |
Frequencies of value counts
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Histogram of lengths of the category
Length
| Max length | 32 |
|---|---|
| Median length | 7 |
| Mean length | 9.10030407 |
| Min length | 4 |
rooms_number
Real number (ℝ≥0)
| Distinct | 45 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 116 |
| Missing (%) | 0.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.421209273 |
|---|---|
| Minimum | 0 |
| Maximum | 204 |
| Zeros | 601 |
| Zeros (%) | 2.3% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 7 |
| Maximum | 204 |
| Range | 204 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 3.456006047 |
|---|---|
| Coefficient of variation (CV) | 1.010170899 |
| Kurtosis | 1231.973886 |
| Mean | 3.421209273 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 26.40591264 |
| Sum | 87364 |
| Variance | 11.9439778 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=45)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 7782 | 30.3% |
| 2 | 6362 | 24.8% |
| 4 | 4042 | 15.8% |
| 5 | 2082 | 8.1% |
| 1 | 1930 | 7.5% |
| 6 | 1190 | 4.6% |
| 0 | 601 | 2.3% |
| 7 | 578 | 2.3% |
| 8 | 331 | 1.3% |
| 9 | 222 | 0.9% |
| Other values (35) | 416 | 1.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 601 | 2.3% |
| 1 | 1930 | 7.5% |
| 2 | 6362 | 24.8% |
| 3 | 7782 | 30.3% |
| 4 | 4042 | 15.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 204 | 2 | < 0.1% |
| 165 | 1 | < 0.1% |
| 100 | 3 | < 0.1% |
| 99 | 1 | < 0.1% |
| 90 | 2 | < 0.1% |
area
Real number (ℝ≥0)
| Distinct | 1664 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 1786 |
| Missing (%) | 7.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19345.44472 |
|---|---|
| Minimum | 0 |
| Maximum | 73500000 |
| Zeros | 983 |
| Zeros (%) | 3.8% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 32 |
| Q1 | 99 |
| median | 154 |
| Q3 | 254 |
| 95-th percentile | 1948 |
| Maximum | 73500000 |
| Range | 73500000 |
| Interquartile range (IQR) | 155 |
Descriptive statistics
| Standard deviation | 809853.4989 |
|---|---|
| Coefficient of variation (CV) | 41.86274912 |
| Kurtosis | 5094.106 |
| Mean | 19345.44472 |
| Median Absolute Deviation (MAD) | 66 |
| Skewness | 67.92246901 |
| Sum | 461698383.7 |
| Variance | 6.558626897e+11 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 983 | 3.8% |
| 120 | 419 | 1.6% |
| 150 | 404 | 1.6% |
| 100 | 376 | 1.5% |
| 160 | 357 | 1.4% |
| 90 | 343 | 1.3% |
| 200 | 341 | 1.3% |
| 140 | 331 | 1.3% |
| 110 | 304 | 1.2% |
| 80 | 298 | 1.2% |
| Other values (1654) | 19710 | 76.8% |
| (Missing) | 1786 | 7.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 983 | 3.8% |
| 1 | 29 | 0.1% |
| 2 | 18 | 0.1% |
| 3 | 13 | 0.1% |
| 4 | 6 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 73500000 | 1 | < 0.1% |
| 56074000 | 2 | < 0.1% |
| 35000000 | 1 | < 0.1% |
| 29000000 | 2 | < 0.1% |
| 26488000 | 1 | < 0.1% |
kitchen_has
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2187 |
| Missing (%) | 8.5% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 17557 | 68.4% |
| 0 | 5908 | 23.0% |
| (Missing) | 2187 | 8.5% |
furnished
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2832 |
| Missing (%) | 11.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 19918 | 77.6% |
| 1 | 2902 | 11.3% |
| (Missing) | 2832 | 11.0% |
open_fire
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2658 |
| Missing (%) | 10.4% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 21611 | 84.2% |
| 1 | 1383 | 5.4% |
| (Missing) | 2658 | 10.4% |
terrace
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 6534 |
| Missing (%) | 25.5% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 11058 | 43.1% |
| 0 | 8060 | 31.4% |
| (Missing) | 6534 | 25.5% |
terrace_area
Real number (ℝ≥0)
| Distinct | 161 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 11427 |
| Missing (%) | 44.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.92674868 |
|---|---|
| Minimum | 0 |
| Maximum | 3749 |
| Zeros | 6711 |
| Zeros (%) | 26.2% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 20 |
| 95-th percentile | 60 |
| Maximum | 3749 |
| Range | 3749 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 42.25307551 |
|---|---|
| Coefficient of variation (CV) | 2.830695177 |
| Kurtosis | 4302.958711 |
| Mean | 14.92674868 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 50.34765983 |
| Sum | 212333 |
| Variance | 1785.32239 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6711 | 26.2% |
| 20 | 450 | 1.8% |
| 10 | 401 | 1.6% |
| 15 | 365 | 1.4% |
| 8 | 330 | 1.3% |
| 6 | 309 | 1.2% |
| 30 | 294 | 1.1% |
| 12 | 286 | 1.1% |
| 25 | 266 | 1.0% |
| 40 | 241 | 0.9% |
| Other values (151) | 4572 | 17.8% |
| (Missing) | 11427 | 44.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6711 | 26.2% |
| 1 | 32 | 0.1% |
| 2 | 136 | 0.5% |
| 3 | 175 | 0.7% |
| 4 | 213 | 0.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3749 | 1 | < 0.1% |
| 708 | 1 | < 0.1% |
| 584 | 1 | < 0.1% |
| 495 | 1 | < 0.1% |
| 450 | 3 | < 0.1% |
garden
Real number (ℝ≥0)
| Distinct | 120 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 3685 |
| Missing (%) | 14.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.98520508 |
|---|---|
| Minimum | 0 |
| Maximum | 3749 |
| Zeros | 15743 |
| Zeros (%) | 61.4% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 15 |
| Maximum | 3749 |
| Range | 3749 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 29.48523751 |
|---|---|
| Coefficient of variation (CV) | 9.877122917 |
| Kurtosis | 11890.21581 |
| Mean | 2.98520508 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 95.45319567 |
| Sum | 65576 |
| Variance | 869.3792312 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 15743 | 61.4% |
| 1 | 4654 | 18.1% |
| 20 | 133 | 0.5% |
| 30 | 103 | 0.4% |
| 15 | 97 | 0.4% |
| 25 | 83 | 0.3% |
| 10 | 73 | 0.3% |
| 40 | 73 | 0.3% |
| 50 | 69 | 0.3% |
| 12 | 51 | 0.2% |
| Other values (110) | 888 | 3.5% |
| (Missing) | 3685 | 14.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 15743 | 61.4% |
| 1 | 4654 | 18.1% |
| 2 | 12 | < 0.1% |
| 3 | 11 | < 0.1% |
| 4 | 32 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3749 | 1 | < 0.1% |
| 708 | 1 | < 0.1% |
| 450 | 2 | < 0.1% |
| 400 | 2 | < 0.1% |
| 350 | 2 | < 0.1% |
garden_area
Real number (ℝ≥0)
| Distinct | 731 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 10074 |
| Missing (%) | 39.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 223.2714084 |
|---|---|
| Minimum | 0 |
| Maximum | 94000 |
| Zeros | 11761 |
| Zeros (%) | 45.8% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 800 |
| Maximum | 94000 |
| Range | 94000 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2014.178429 |
|---|---|
| Coefficient of variation (CV) | 9.021210745 |
| Kurtosis | 876.3142327 |
| Mean | 223.2714084 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 26.60671719 |
| Sum | 3478122 |
| Variance | 4056914.742 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11761 | 45.8% |
| 1 | 239 | 0.9% |
| 100 | 127 | 0.5% |
| 50 | 91 | 0.4% |
| 200 | 89 | 0.3% |
| 300 | 87 | 0.3% |
| 500 | 73 | 0.3% |
| 150 | 67 | 0.3% |
| 30 | 64 | 0.2% |
| 60 | 62 | 0.2% |
| Other values (721) | 2918 | 11.4% |
| (Missing) | 10074 | 39.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11761 | 45.8% |
| 1 | 239 | 0.9% |
| 2 | 1 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 94000 | 1 | < 0.1% |
| 75000 | 2 | < 0.1% |
| 63000 | 2 | < 0.1% |
| 58000 | 1 | < 0.1% |
| 55000 | 2 | < 0.1% |
land_surface
Real number (ℝ≥0)
| Distinct | 1831 |
|---|---|
| Distinct (%) | 9.4% |
| Missing | 6269 |
| Missing (%) | 24.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 690.7379147 |
|---|---|
| Minimum | 0 |
| Maximum | 1379000 |
| Zeros | 11203 |
| Zeros (%) | 43.7% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 301 |
| 95-th percentile | 2103.6 |
| Maximum | 1379000 |
| Range | 1379000 |
| Interquartile range (IQR) | 301 |
Descriptive statistics
| Standard deviation | 10619.48957 |
|---|---|
| Coefficient of variation (CV) | 15.37412287 |
| Kurtosis | 14651.97134 |
| Mean | 690.7379147 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 113.7686367 |
| Sum | 13388573 |
| Variance | 112773558.7 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11203 | 43.7% |
| 100 | 104 | 0.4% |
| 150 | 92 | 0.4% |
| 300 | 81 | 0.3% |
| 200 | 75 | 0.3% |
| 400 | 69 | 0.3% |
| 120 | 65 | 0.3% |
| 250 | 62 | 0.2% |
| 1000 | 61 | 0.2% |
| 50 | 52 | 0.2% |
| Other values (1821) | 7519 | 29.3% |
| (Missing) | 6269 | 24.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11203 | 43.7% |
| 1 | 42 | 0.2% |
| 2 | 2 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1379000 | 1 | < 0.1% |
| 150000 | 2 | < 0.1% |
| 117800 | 2 | < 0.1% |
| 110000 | 1 | < 0.1% |
| 103553 | 1 | < 0.1% |
land_plot_surface
Real number (ℝ≥0)
| Distinct | 2867 |
|---|---|
| Distinct (%) | 16.6% |
| Missing | 8371 |
| Missing (%) | 32.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8845478.135 |
|---|---|
| Minimum | 0 |
| Maximum | 1.35e+10 |
| Zeros | 911 |
| Zeros (%) | 3.6% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 99 |
| median | 255 |
| Q3 | 970 |
| 95-th percentile | 2150000 |
| Maximum | 1.35e+10 |
| Range | 1.35e+10 |
| Interquartile range (IQR) | 871 |
Descriptive statistics
| Standard deviation | 260663245.9 |
|---|---|
| Coefficient of variation (CV) | 29.46853092 |
| Kurtosis | 1631.754895 |
| Mean | 8845478.135 |
| Median Absolute Deviation (MAD) | 204 |
| Skewness | 38.43006696 |
| Sum | 1.528587076e+11 |
| Variance | 6.794532779e+16 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 911 | 3.6% |
| 100 | 169 | 0.7% |
| 90 | 163 | 0.6% |
| 80 | 144 | 0.6% |
| 70 | 141 | 0.5% |
| 110 | 139 | 0.5% |
| 120 | 139 | 0.5% |
| 150 | 131 | 0.5% |
| 200 | 110 | 0.4% |
| 85 | 110 | 0.4% |
| Other values (2857) | 15124 | 59.0% |
| (Missing) | 8371 | 32.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 911 | 3.6% |
| 1 | 32 | 0.1% |
| 1.28 | 1 | < 0.1% |
| 1.64 | 1 | < 0.1% |
| 1.77 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.35e+10 | 1 | < 0.1% |
| 1.3e+10 | 1 | < 0.1% |
| 1.28e+10 | 1 | < 0.1% |
| 1.18e+10 | 1 | < 0.1% |
| 8100000000 | 1 | < 0.1% |
facades_number
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5580 |
| Missing (%) | 21.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.625548027 |
|---|---|
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 8610 |
| Zeros (%) | 33.6% |
| Memory size | 200.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.559150754 |
|---|---|
| Coefficient of variation (CV) | 0.9591539146 |
| Kurtosis | -1.376505337 |
| Mean | 1.625548027 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.2495757249 |
| Sum | 32628 |
| Variance | 2.430951072 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8610 | 33.6% |
| 2 | 5066 | 19.7% |
| 4 | 3548 | 13.8% |
| 3 | 2719 | 10.6% |
| 1 | 127 | 0.5% |
| 10 | 2 | < 0.1% |
| (Missing) | 5580 | 21.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8610 | 33.6% |
| 1 | 127 | 0.5% |
| 2 | 5066 | 19.7% |
| 3 | 2719 | 10.6% |
| 4 | 3548 | 13.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 2 | < 0.1% |
| 4 | 3548 | 13.8% |
| 3 | 2719 | 10.6% |
| 2 | 5066 | 19.7% |
| 1 | 127 | 0.5% |
swimming_pool_has
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2971 |
| Missing (%) | 11.6% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 21694 | 84.6% |
| 1 | 987 | 3.8% |
| (Missing) | 2971 | 11.6% |
building_state
Categorical
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Not specified | 13146 | 51.2% |
| AS_NEW | 5466 | 21.3% |
| GOOD | 3803 | 14.8% |
| TO_BE_DONE_UP | 1037 | 4.0% |
| TO_RENOVATE | 892 | 3.5% |
| JUST_RENOVATED | 806 | 3.1% |
| old | 240 | 0.9% |
| New | 198 | 0.8% |
| TO_RESTORE | 64 | 0.2% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 14 |
|---|---|
| Median length | 13 |
| Mean length | 9.95778107 |
| Min length | 3 |
region
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 200.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| unknown | 11707 | 45.6% |
| Flanders | 7189 | 28.0% |
| Wallonia | 4223 | 16.5% |
| Brussels | 2533 | 9.9% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.54362233 |
| Min length | 7 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and in positive scale of the two variables. For an exact linear relationship, the steepness of the line therefore does not affect r; only the sign of its slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
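The definition above translates directly into code: the covariance divided by the product of the standard deviations. The sketch below has no dependencies. It also checks the invariance property: a positive shift and rescale of one variable leaves r unchanged, while a negative rescale only flips its sign. The data is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n normalisation appears in the covariance and in both standard
    # deviations, so it cancels and is left out here.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

xs = [50, 66, 90, 101, 175, 350]      # hypothetical areas
ys = [120, 129, 159, 333, 475, 1650]  # hypothetical prices (thousands)

r = pearson_r(xs, ys)
r_shifted_scaled = pearson_r([10 * x + 3 for x in xs], ys)  # positive rescaling
r_flipped = pearson_r([-x for x in xs], ys)                  # negative rescaling
print(round(r, 6), round(r_shifted_scaled, 6), round(r_flipped, 6))
```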
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
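The sketch below follows the definition above: replace each value by its rank, then compute Pearson's r on the ranks. Tied values share the average of their positions, the same convention as pandas' default `Series.rank()`. The `x ** 3` example is hypothetical. It is nonlinear, so Pearson's r is below 1, but perfectly monotonic, so ρ is exactly 1.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # nonlinear but perfectly monotonic
print(round(spearman_rho(xs, ys), 12))  # 1.0
```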
Kendall's τ
Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations: τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
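The sketch below implements the pair-counting definition above, which is the τ_a variant, over all n(n-1)/2 pairs. Many columns in this dataset have heavy ties, such as the large share of zeros. In that case results may differ from pandas' `DataFrame.corr(method="kendall")`, which relies on SciPy's `kendalltau`, whose default is the tie-adjusted τ_b. Whether pandas-profiling goes through pandas for this is an assumption. The O(n²) loop is for illustration only.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in X or Y: the pair counts toward neither.
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```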
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proven to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | Unnamed: 0 | locality | house_is | property_subtype | price | sale | rooms_number | area | kitchen_has | furnished | open_fire | terrace | terrace_area | garden | garden_area | land_surface | land_plot_surface | facades_number | swimming_pool_has | building_state | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32349 | 32349 | 1380 | True | HOUSE | 1650000.0 | Unknown | 4.0 | 350.0 | 1.0 | 0.0 | 1.0 | 0.0 | NaN | 0.0 | 0.0 | 4040.0 | NaN | 4.0 | 0.0 | AS_NEW | Wallonia |
| 1 | 12426 | 12426 | 1970 | False | APARTMENT | 332956.0 | Unknown | 2.0 | 101.0 | 1.0 | 0.0 | 0.0 | 1.0 | 18.0 | 0.0 | 0.0 | 0.0 | 101.0 | 0.0 | 0.0 | Not specified | Flanders |
| 2 | 12185 | 12185 | 8400 | False | APARTMENT | 159000.0 | Unknown | 2.0 | 90.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 90.0 | 0.0 | 0.0 | TO_BE_DONE_UP | Flanders |
| 3 | 26760 | 26760 | unknown | True | villa | 230000.0 | unknown | 3.0 | 110.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 1.0 | 700.0 | NaN | 824.0 | 4.0 | 0.0 | Not specified | unknown |
| 4 | 16985 | 16985 | 9040 | False | APARTMENT | 195000.0 | Unknown | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Not specified | Flanders |
| 5 | 29483 | 29483 | unknown | False | apartment | 275000.0 | unknown | 1.0 | 99.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | NaN | NaN | 2.0 | 0.0 | Not specified | unknown |
| 6 | 5337 | 5337 | 6637 | True | HOUSE | 210000.0 | Unknown | 4.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1600.0 | 1600.0 | 0.0 | 0.0 | Not specified | Wallonia |
| 7 | 50777 | 50777 | unknown | False | APARTMENT | 895000.0 | residential_sale | 3.0 | 227.0 | 1.0 | NaN | 0.0 | NaN | NaN | 50.0 | NaN | 0.0 | NaN | 3.0 | NaN | AS_NEW | unknown |
| 8 | 8162 | 8162 | 4530 | True | HOUSE | 265000.0 | Unknown | 3.0 | 185.0 | 1.0 | 0.0 | 0.0 | 1.0 | 30.0 | 1.0 | 170.0 | 0.0 | 185.0 | 0.0 | 0.0 | Not specified | Wallonia |
| 9 | 31178 | 31178 | 1140 | True | HOUSE | 850000.0 | Unknown | 5.0 | 305.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 500.0 | NaN | 2.0 | 0.0 | GOOD | Brussels |
Last rows
| | df_index | Unnamed: 0 | locality | house_is | property_subtype | price | sale | rooms_number | area | kitchen_has | furnished | open_fire | terrace | terrace_area | garden | garden_area | land_surface | land_plot_surface | facades_number | swimming_pool_has | building_state | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25642 | 30066 | 30066 | unknown | True | house | 248000.0 | unknown | 3.0 | 216.00 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | NaN | 780.0 | 3.0 | 0.0 | Not specified | unknown |
| 25643 | 47148 | 47148 | unknown | False | APARTMENT_BLOCK | 199000.0 | residential_sale | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 220.0 | 2.0 | NaN | Not specified | unknown |
| 25644 | 30795 | 30795 | unknown | True | house | 530000.0 | unknown | 3.0 | 200.00 | 1.0 | 0.0 | 0.0 | 1.0 | 25.0 | 1.0 | NaN | NaN | 452.0 | 3.0 | 0.0 | Not specified | unknown |
| 25645 | 22842 | 22842 | unknown | False | ground-floor | 119999.0 | unknown | 1.0 | 50.00 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | Not specified | unknown |
| 25646 | 31588 | 31588 | 2400 | True | HOUSE | 325000.0 | Unknown | 3.0 | 180.00 | 1.0 | 0.0 | 0.0 | 1.0 | 28.0 | 1.0 | 77.0 | 200.0 | NaN | 2.0 | 0.0 | AS_NEW | Flanders |
| 25647 | 18108 | 18108 | unknown | False | Apartment | 6.0 | Apartment | 8.0 | 6350.71 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 6800000.0 | NaN | 0.0 | Not specified | unknown |
| 25648 | 19270 | 19270 | unknown | False | Apartment | 2.0 | Apartment | 6.0 | 1883.68 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | NaN | 2450000.0 | NaN | 0.0 | Not specified | unknown |
| 25649 | 51075 | 51075 | unknown | False | PENTHOUSE | 1396000.0 | residential_sale | 3.0 | 202.00 | 1.0 | NaN | 0.0 | NaN | NaN | 157.0 | NaN | 0.0 | NaN | NaN | 0.0 | AS_NEW | unknown |
| 25650 | 10129 | 10129 | 8400 | False | APARTMENT | 129000.0 | Unknown | 2.0 | 66.00 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 66.0 | 0.0 | 0.0 | GOOD | Flanders |
| 25651 | 11940 | 11940 | 1040 | False | APARTMENT | 475000.0 | Unknown | 2.0 | 175.00 | 1.0 | 0.0 | 0.0 | 1.0 | 25.0 | 0.0 | 0.0 | 0.0 | 7.0 | 0.0 | 0.0 | AS_NEW | Brussels |
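The bias-corrected Cramér's V mentioned in the correlation notes can be sketched without dependencies. The sketch assumes the correction commonly attributed to Bergsma (2013):

- φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and floored at 0.
- The row and column counts r and k are corrected to r − (r − 1)²/(n − 1) and k − (k − 1)²/(n − 1).

This is not pandas-profiling's own code, and the example data is hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma-style correction) for two nominal variables."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        raise ValueError("each variable needs at least two categories")

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n

    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical: each region maps to exactly one postcode -> perfect association.
regions = ["Flanders"] * 50 + ["Wallonia"] * 50
postcodes = ["8400"] * 50 + ["4000"] * 50
print(round(cramers_v_corrected(regions, postcodes), 9))  # 1.0
```

The correction also pulls small spurious associations down to exactly 0. A perfectly balanced, independent table gives V = 0.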